Code Generation

Source to Source Translation

In the code generation phase, the compiler traverses the parse tree and emits target code.
See examples/source2source for an example.
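As a rough illustration of "traverse the tree and emit code", here is a minimal sketch in C. It is not the code in examples/source2source: the Node type and the emit and gen_expr functions are hypothetical names invented for this example, and the sketch assumes a simple stack-based strategy where every subexpression leaves its 32-bit result on the stack.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical expression-tree node; not taken from the course examples. */
typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                  /* used when kind == NODE_NUM */
    const struct Node *left;    /* operands of NODE_ADD / NODE_MUL */
    const struct Node *right;
} Node;

/* Append one indented line of assembly to buf (capacity cap bytes). */
void emit(char *buf, size_t cap, const char *line)
{
    size_t used = strlen(buf);
    if (used < cap)
        snprintf(buf + used, cap - used, "        %s\n", line);
}

/* Post-order traversal: generate code for both operands, then for the
   operator. Every subtree leaves its 32-bit result on top of the stack. */
void gen_expr(const Node *n, char *buf, size_t cap)
{
    char line[32];

    if (n->kind == NODE_NUM) {
        snprintf(line, sizeof line, "pushl $%d", n->value);
        emit(buf, cap, line);
        return;
    }
    gen_expr(n->left, buf, cap);
    gen_expr(n->right, buf, cap);
    emit(buf, cap, "popl %ebx");     /* right operand */
    emit(buf, cap, "popl %eax");     /* left operand */
    emit(buf, cap, n->kind == NODE_ADD ? "addl %ebx, %eax"
                                       : "imull %ebx, %eax");
    emit(buf, cap, "pushl %eax");    /* result of this subtree */
}
```

For the tree of 2 + 3 * 4, calling gen_expr on the root with an empty buffer emits the three pushl instructions first, then the multiply sequence, then the add sequence. The order of the emitted code follows the order of the traversal.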

Intro to Linux Assembler Programming

See examples/codegen/hello for a simple example of a Linux assembler program that displays some output.

One way to learn is by examining the assembly output of the gcc compiler. Create a small C test program, and invoke gcc with the -S switch:

gcc -S myprog.c

This produces myprog.s containing the generated assembly code.
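A file along these lines works well as a starting point. The contents are only a suggestion (the functions add and sum_to are made up for illustration). Because -S stops before linking, the file does not need a main function.

```c
/* myprog.c -- a small, hypothetical file to feed to gcc -S */

/* A function with two parameters and one arithmetic operation,
   so the generated assembly stays short enough to read. */
int add(int a, int b)
{
    return a + b;
}

/* A loop, to see how gcc lays out labels, comparisons, and jumps. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

In the resulting myprog.s, look for the labels add: and sum_to: and compare the instructions under each with the C source.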

Some notes about writing Linux assembly code:

Precede your data declarations with the .data directive
Precede executable instructions with the .text directive
Comment with #
Define numeric constants using
```
constant_name = value
```
Ex:
```
STDOUT = 1
```
Labels start at the beginning of the line and end with a colon. Ex:
```
hello:
```

AT&T Syntax

The syntax used by Unix assemblers, “AT&T syntax”, differs from the “Intel syntax” used by Windows assemblers in several ways:

The source operand comes first, followed by the destination operand
Register names are prefixed with %
Instruction names have suffixes to indicate the operand size: ‘l’ for long (32 bits), ‘w’ for word (16 bits), ‘b’ for byte (8 bits). To keep matters simple, your code will deal strictly with 32-bit operands.
Pointer dereference uses ( ) instead of [ ]

Compare Intel syntax to the AT&T syntax:

| AT&T Syntax | Intel Syntax |
| --- | --- |
| `movl $1, %eax` | `mov eax, 1` |
| `movl (%ebx), %eax` | `mov eax, [ebx]` |
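One way to try AT&T syntax without writing a whole .s file is GCC's inline assembly, which uses AT&T syntax by default. The sketch below mirrors the two AT&T instructions shown in the comparison. It assumes GCC or Clang on an x86 target. The function names load_one and load_from are made up for this example, and the %0 and %1 placeholders are an inline-assembly feature: the compiler substitutes registers of its choosing, not necessarily %eax and %ebx.

```c
/* Assumption: GCC or Clang on x86 (32- or 64-bit). Other targets use the
   plain C fallback so the file still compiles. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86_ASM 1
#endif

/* Like movl $1, %eax: immediate source first, register destination second. */
int load_one(void)
{
    int result;
#ifdef HAVE_X86_ASM
    __asm__("movl $1, %0" : "=r"(result));
#else
    result = 1;
#endif
    return result;
}

/* Like movl (%ebx), %eax: parentheses dereference the pointer held in a
   register. */
int load_from(const int *p)
{
    int result;
#ifdef HAVE_X86_ASM
    __asm__("movl (%1), %0" : "=r"(result) : "r"(p) : "memory");
#else
    result = *p;
#endif
    return result;
}
```

Compiling this file with gcc -S shows the instructions with the actual registers filled in.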

Here’s a Hello World assembler program:

```
# hello.s -- Hello World in Linux Assembler

STDOUT = 1     # define a constant

.data
hello:
        .string "hello world\n"

.text
.global main
main:
        # Call write(STDOUT, "hello world\n", 12)
        pushl $12
        pushl $hello
        pushl $STDOUT
        call    write
        addl   $12, %esp

        # Call exit(0)
        pushl $0
        call    exit  # no return...

        # or ...
        #movl  $1, %eax    # 1 is the number of the exit system call
        #movl  $0, %ebx    # 0 is the parameter for exit
        #int     $0x80
```

To assemble this program, simply use gcc:

gcc hello.s -ohello

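gcc invokes the assembler and linker to produce the executable. Because this is 32-bit code, on a 64-bit Linux system you will likely need to add the -m32 switch (and have 32-bit libraries installed).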
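For comparison, the write call in hello.s corresponds to the C below. This is a sketch for reading alongside the assembly, and say_hello is a name invented here. Note that the 32-bit C calling convention pushes arguments right to left, which is why hello.s pushes $12 first and $STDOUT last, and why it adds 12 to %esp afterward (three 4-byte arguments).

```c
#include <unistd.h>   /* write() */

#define STDOUT 1      /* same constant as in hello.s */

/* Same call the assembly makes: write(STDOUT, "hello world\n", 12).
   Returns the number of bytes written, or -1 on error. */
ssize_t say_hello(void)
{
    return write(STDOUT, "hello world\n", 12);
}
```

A main function that calls say_hello() and then exit(0) would behave like the assembly program.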

Linux I/O

Our programs must have a way to do I/O. Since our code will run on the Linux platform, it will perform Linux system calls to do the I/O.

The Linux I/O system calls are fairly simple: read() gets input from an open file, and write() sends output to an open file. Each takes a file descriptor that specifies which file to read from or write to.

In Linux, use the man command to get information on these functions:

man 2 read
man 2 write

You’ll be reading from stdin (file descriptor 0), and writing to stdout (file descriptor 1).
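Before writing these calls in assembly, it can help to see them in C. The following is a minimal sketch, with copy_fd as a made-up helper name, that reads from one file descriptor and writes everything to another using only read() and write().

```c
#include <unistd.h>   /* read(), write() */

/* Copy everything from file descriptor in to file descriptor out.
   Returns the total number of bytes copied, or -1 on error. */
long copy_fd(int in, int out)
{
    char buf[256];
    long total = 0;
    ssize_t n;

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {                      /* write() may be partial */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0)
                return -1;
            done += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

Calling copy_fd(0, 1) echoes stdin to stdout. read() returns 0 at end of input, which is what ends the loop.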

CpS 450 Language Translation Systems
